Fix GitHub API endpoint selection based on host parsing#1850
Fix GitHub API endpoint selection based on host parsing#1850bc-lee wants to merge 4 commits intomicrosoft:mainfrom
Conversation
src/vcpkg/base/downloads.cpp
Outdated
| } | ||
|
|
||
| // Extracts the host part from a URL string. | ||
| std::string extract_host(StringView url) |
There was a problem hiding this comment.
This thing needs unit tests.
There was a problem hiding this comment.
URL parsing is an insanely complicated problem; a parsing function that just returns whether we get github.com or not may be easier. Otherwise we may want to look at using trurl.
There was a problem hiding this comment.
Thanks for the review. I’ll revisit this once the curl library integration has been re-landed.
src/vcpkg/base/downloads.cpp
Outdated
| const char* at_sign = Strings::find_first_of(url, "@"); | ||
| size_t at_pos = at_sign[0] == '\0' ? std::string::npos : static_cast<size_t>(at_sign - url.data()); |
There was a problem hiding this comment.
This is undefined behavior: you can't dereference the returned iterator if there was no match. And there is no requirement that there is a \0 after a StringView
src/vcpkg/base/downloads.cpp
Outdated
| } | ||
|
|
||
| // Remove userinfo if present (e.g., user:pass@host) | ||
| const char* at_sign = Strings::find_first_of(url, "@"); |
There was a problem hiding this comment.
Seems little reason to use find_first_of when there is only one char you're looking for?
|
Ah crap it looks like the URL APIs were added in 7.62.0 which is more recent than we currently assume. So we might be stuck using a handwritten thing like this or possibly copy/pasta-ing the relevant bits from libcurl's urlapi.c (with attribution) |
|
(I guess that most systems GitHub cares about would have greater than 7.62 so that may be fine) |
|
I reviewed your suggestion to use libcurl’s URL API for URL parsing.
While reading https://learn.microsoft.com/en-us/vcpkg/concepts/supported-hosts and https://github.com/microsoft/vcpkg-tool/blob/792f2dba1715f180412eef13b4de55f8085c82ca/cmake/FindLibCURL.cmake, I noticed that vcpkg-tool is currently too permissive about the minimum libcurl version it accepts. The libcurl URL API was introduced in 7.62.0 (released on 2018-10-31), and the lowest supported libcurl version on Linux is 7.74.0, which is already newer than 7.62.0. That means we should be able to use the libcurl URL API within vcpkg-tool’s supported host baseline. If you’re OK with that direction, I’d like to split this work into two PRs:
|
|
@ras0219-msft suggests (and @MahmoudGSaleh agrees): rather than doing the whole URI parsing, can we just check "if it's https://github.com" (just a string match) do the special github thing, otherwise do the existing behavior? That way we don't need to make the curl situation more complicated. (And I'm sorry for not noticing this before when I first reviewed this 😅) |
Previously, any provided GitHub URL was treated as a GitHub Enterprise Server (GHES) instance when selecting the API endpoint. This assumption was incorrect because the URL can also be for GitHub.com (e.g., https://github.com), which led to using the wrong API endpoint. This commit adds a function to extract the host from a URL. This is used to differentiate between GitHub.com and GHES instances to select the correct API endpoint.
79462ea to
da9910c
Compare
|
I simplified the logic significantly to align with your suggestion. Could you please take another look? |
Previously, when
GITHUB_SERVER_URLwas set, vcpkg always treated it as a GitHub Enterprise Server(GHES) endpoint and appended/api/v3forGITHUB_SERVER_URL=https://github.com, this produced an incorrect endpoint (https://github.com/api/v3/...).This change special-cases the exact value
https://github.comand useshttps://api.github.comfor dependency graph submission. All other server URLs keep the existing behavior (<server>/api/v3/...).Added tests for endpoint selection behavior in
src/vcpkg-test/downloads.cpp.Fixes microsoft/vcpkg#48347